Part:BBa_K3627010:Design
First heavy chain for SARS-CoV-2 antibody
- 10: INCOMPATIBLE WITH RFC[10]
  Illegal PstI site found at 241
  Illegal PstI site found at 529
- 12: INCOMPATIBLE WITH RFC[12]
  Illegal PstI site found at 241
  Illegal PstI site found at 529
- 21: COMPATIBLE WITH RFC[21]
- 23: INCOMPATIBLE WITH RFC[23]
  Illegal PstI site found at 241
  Illegal PstI site found at 529
- 25: INCOMPATIBLE WITH RFC[25]
  Illegal PstI site found at 241
  Illegal PstI site found at 529
  Illegal AgeI site found at 465
- 1000: COMPATIBLE WITH RFC[1000]
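A compatibility report like the one above can be roughly reproduced by searching a DNA sequence for restriction enzyme recognition sites. The following is a minimal sketch, not the Registry's actual checker: it covers only the two enzymes named in the report (PstI, CTGCAG; AgeI, ACCGGT), the function name `find_sites` and the demo sequence are made up for illustration, and the 1-based position convention is an assumption.

```python
# Recognition sequences for the two enzymes named in the report.
# Both are palindromic, so scanning one strand is sufficient.
SITES = {"PstI": "CTGCAG", "AgeI": "ACCGGT"}


def find_sites(seq, sites=SITES):
    """Return (enzyme, position) hits sorted by position.

    Positions are 1-based (an assumption about the reporting convention).
    Overlapping occurrences are reported as well.
    """
    seq = seq.upper()
    hits = []
    for enzyme, motif in sites.items():
        start = seq.find(motif)
        while start != -1:
            hits.append((enzyme, start + 1))
            start = seq.find(motif, start + 1)
    return sorted(hits, key=lambda hit: hit[1])


# Made-up demo sequence, NOT the sequence of this part.
demo = "AAACTGCAGTTTACCGGTAA"
print(find_sites(demo))  # [('PstI', 4), ('AgeI', 13)]
```

Running the same function on the part sequence would list the sites that need to be removed, for example by synonymous codon changes, before using an assembly standard that forbids them.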
Design Notes
This is the most ambitious aspect of our project. We began with an already-characterized antibody that binds to the spike protein, the neutralizing antibody S309 (PDB ID 6WPT). The basic idea is to generate a large number of candidate sequences via random point mutations.
Initially, we designed a Gaussian Process Regression model that we hoped would generate several predicted iterations of sequences based on the binding energies of the antibody to various spike protein mutants. This did not work, however, because the model could not generate properly mutated sequences from numerical binding energy values alone. After several discussions with experts, we came across the concept of the genetic algorithm, a computational simulation of natural selection.
We then designed a simple genetic algorithm that applies random mutations to several positions in the antibody sequence. The algorithm was run on both the heavy and light chain sequences, generating several newly mutated sequences. PDB files were generated for these sequences and then tested in PyRosetta for binding with the spike proteins. After the REU values were obtained through Rosetta, the dominant sequences, those that can bind to the spike protein, were kept in the population. A flow chart attached below briefly describes the process. To reduce the time spent on folding, we applied several optimizations, described below:
- Any sequence with more than 4 mutations is discarded, because too many mutations greatly reduce the quality of the protein model. Only key mutations are kept.
- The distance between mutations is kept as large as possible to reduce interference between them.
- Before a protein is folded, a stability test is run to determine at what quality the protein should be folded.
- Higher-scoring sequences make up the majority of the population, so that we do not have to repeat mutations.
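The mutate-score-select loop with the 4-mutation cap can be sketched as follows. This is a simplified illustration under stated assumptions, not our actual pipeline: the names `mutate`, `evolve`, and `toy_score` are hypothetical, the toy score is only a stand-in for the PyRosetta binding score, lower scores are treated as better (as with Rosetta energies), and the spacing and stability heuristics from the list are omitted.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MAX_MUTATIONS = 4  # cap taken from the text


def mutated_positions(seq, wild_type):
    """Indices where seq differs from the wild type."""
    return [i for i, (a, b) in enumerate(zip(seq, wild_type)) if a != b]


def mutate(seq, wild_type, allowed_positions, rng):
    """Apply one random point mutation; return None if the cap is exceeded."""
    pos = rng.choice(allowed_positions)
    new_aa = rng.choice([aa for aa in AMINO_ACIDS if aa != seq[pos]])
    child = seq[:pos] + new_aa + seq[pos + 1:]
    if len(mutated_positions(child, wild_type)) > MAX_MUTATIONS:
        return None  # "killed": too many mutations
    return child


def evolve(wild_type, allowed_positions, score,
           generations=10, pop_size=8, seed=0):
    """Keep the best-scoring (lowest) sequences each generation."""
    rng = random.Random(seed)
    population = [wild_type]
    for _ in range(generations):
        children = []
        for parent in population:
            child = mutate(parent, wild_type, allowed_positions, rng)
            if child is not None:
                children.append(child)
        # Deduplicate, then rank; ties are broken alphabetically.
        pool = sorted(set(population + children),
                      key=lambda s: (score(s), s))
        population = pool[:pop_size]
    return population


def toy_score(seq):
    # Hypothetical placeholder for a PyRosetta binding score (lower = better).
    return -float(seq.count("W"))


wild_type = "ACDEFGHIKL"  # made-up sequence for illustration
survivors = evolve(wild_type, [1, 3, 5, 7, 9], toy_score)
print(survivors[0], toy_score(survivors[0]))
```

Because parents stay in the pool alongside their children, the best score in the population never gets worse from one generation to the next.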
Mutation scans were also conducted with the antibody and spike protein files to give insight into which amino acid sites merit further research and which spike protein mutations will cause problems. The heatmaps are analyzed in the results section. Finally, 3 light chain and 3 heavy chain sequences were picked out of the population: those with the best binding scores against the different variants in the spike protein mutation scan. We uploaded these as coding parts and composite parts for future iGEM teams to test as a neutralizing agent against COVID. Our antibody is also used by the NEGEM team in their project design.
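A mutation scan of this kind can be pictured as a grid of score changes, one row per sequence position and one column per substituted amino acid, which is the data behind a heatmap. The sketch below is illustrative only: `mutation_scan` and `toy_score` are hypothetical names, and the toy score stands in for a structure-based scoring step.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mutation_scan(seq, score):
    """Return a len(seq) x 20 grid of score changes versus the input sequence.

    grid[pos][k] is score(mutant) - score(seq), where the mutant carries
    AMINO_ACIDS[k] at position pos. Substituting the original residue gives 0.
    """
    base = score(seq)
    grid = []
    for pos in range(len(seq)):
        row = []
        for aa in AMINO_ACIDS:
            mutant = seq[:pos] + aa + seq[pos + 1:]
            row.append(score(mutant) - base)
        grid.append(row)
    return grid


def toy_score(seq):
    # Hypothetical placeholder score (lower = better), not a real energy.
    return -float(seq.count("W"))


grid = mutation_scan("ACW", toy_score)  # made-up 3-residue sequence
for residue, row in zip("ACW", grid):
    print(residue, row)
```

Rows with large positive changes mark positions where substitutions are costly under the chosen score, and rows with negative changes mark candidate sites for improvement.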
Source
This sequence is a mutated form of the wild-type heavy chain of the antibody; the mutated form is able to bind to the spike protein more effectively.